
in silico Plants

Oxford University Press (OUP)

Preprints posted in the last 90 days, ranked by how well they match in silico Plants's content profile, based on 24 papers previously published here. The average preprint has a 0.02% match score for this journal, so anything above that is already an above-average fit.

1
A Large Yield Model for Crop Production and Design in Western Canada

Ubbens, J.; Loliencar, P.; Kagale, S.

2026-04-11 bioinformatics 10.64898/2026.04.08.717277 medRxiv
Top 0.1%
68.4%

With a changing climate, disease pressure, and other production threats, it is critical to ensure that crop producers are well-positioned to protect and optimize yields. In this work we present LYM-1, the first large-scale, multi-crop model for the prediction of yield performance in the Canadian prairies. This is enabled by a large dataset containing over 4.7 million yield observations across 10 different crop types, distributed over 23 growing years. Leveraging additional data sources for weather and soil properties allows the model to reason about the complex interactions between genetics, environment, and management which underlie yield. The trained model is not only effective at predicting the yield for held-out data, but also reveals scientifically and agronomically relevant effects such as the interaction between solar radiation and nitrogen uptake. We anticipate that large yield models can be used both by producers to optimize crop production and by plant breeders and industry for crop design.

2
ArchiCrop: a 3D+t architectural model driven by crop model dynamics

Braud, O.; Vezy, R.; Arsouze, T.; Jaeger, M.; Adam, M.; Pradal, C.

2026-04-09 plant biology 10.64898/2026.04.07.716970 medRxiv
Top 0.1%
52.7%

Evolving agricultural practices and contexts invite us to reconsider the way crop and plant models represent agroecosystem processes. Crop models assume spatial homogeneity, which reduces confidence in their predictions for structurally heterogeneous systems, whereas the complexity of FSPMs limits their application to field-scale or large-scale studies. To benefit from the strengths of both approaches, we introduce ArchiCrop, a parametric 3D architectural model of cereals that generates plant geometries constrained by crop model dynamics and coordination rules. Inspired by the concept of equifinality, ArchiCrop generates a morphospace of architecturally diverse morphotypes which remain equivalent at crop scale in terms of LAI and height. This multiscale approach enables the comparison of processes computed at different scales on wheat, rice, maize and sorghum. We demonstrate its application by evaluating the Beer's law formalism for light interception in the STICS soil-crop model, relying on the leaf-resolved radiosity model Caribu for the 3D reference simulations, for a sorghum monocrop. We show that considering the variability of only two plant architectural traits, leaf insertion angle and leaf number, introduces up to 27% uncertainty in the cumulative absorbed light at the end of the season. A possible outcome of this method is also the definition of metamodels for crop model processes, as exemplified for the extinction coefficient of Beer's law. ArchiCrop can support a range of applications, including crop model uncertainty analysis, model-assisted phenotyping, and ideotype design.

Highlights:
- ArchiCrop is the first 3D+t botany-based parametric generative model for cereals.
- ArchiCrop downscales crop model dynamics to 3D+t architecture canopies efficiently.
- ArchiCrop compares big leaf versus leaf-resolved light interception.
- Plant architectures with same leaf area intercept light with up to 27% variability.
- ArchiCrop helps ideotyping, crop model evaluation and error propagation analysis.
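The Beer's law formalism mentioned in this abstract can be sketched in a few lines. This is a minimal illustration of the textbook big-leaf relation, in which the intercepted fraction of light is 1 - exp(-k * LAI); it is not the STICS or ArchiCrop implementation. The function names and the numeric values of `k` and LAI are assumptions chosen for illustration.

```python
import math


def intercepted_fraction(lai: float, k: float) -> float:
    """Fraction of incoming light intercepted by a homogeneous canopy.

    Textbook Beer's law: f = 1 - exp(-k * LAI), where k is the
    extinction coefficient and LAI the leaf area index.
    """
    if lai < 0 or k < 0:
        raise ValueError("lai and k must be non-negative")
    return 1.0 - math.exp(-k * lai)


def extinction_coefficient(lai: float, fraction: float) -> float:
    """Invert Beer's law: the k that reproduces a given intercepted fraction.

    This is the kind of quantity one could fit against a 3D reference
    simulation (hypothetical use, not the authors' procedure).
    """
    if lai <= 0 or not 0.0 <= fraction < 1.0:
        raise ValueError("need lai > 0 and 0 <= fraction < 1")
    return -math.log(1.0 - fraction) / lai


if __name__ == "__main__":
    # Illustrative values only: same LAI, different extinction coefficients.
    for k in (0.4, 0.6, 0.8):
        print(f"k={k:.1f}  intercepted={intercepted_fraction(3.0, k):.3f}")
```

Because canopies with the same LAI but different architectures intercept different amounts of light, a single fixed `k` cannot represent all of them, which is the kind of uncertainty the abstract quantifies.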

3
Kinetic model of a determinate legume root nodule reveals plant metabolic characteristics for more efficient nitrogen fixation symbiosis

Ji, R.; Kaste, J. A. M.; Matthews, M. L.

2026-05-01 plant biology 10.64898/2026.04.28.721409 medRxiv
Top 0.1%
42.0%

While nitrogen fertilizers are widely used in agricultural production, their application incurs significant environmental and energetic costs. In contrast, some crops are less dependent on these fertilizers because they engage in symbioses with rhizobia, nitrogen-fixing bacteria that provide ammonium to the plant in exchange for carbon. However, the carbon cost associated with nitrogen fixation can negatively impact crop yields. Improving the efficiency of this metabolic process could alleviate this impact on crop productivity. Mathematical models can help us quantitatively explore metabolic behavior and identify potential targets for metabolic engineering. In this work, we developed a kinetic model of determinate root nodule metabolism, where this symbiotic exchange of carbon from the plant and nitrogen from the bacteria occurs. We used this model to evaluate how the predicted metabolic behavior differs between inefficient and efficient nodules, and to identify potential engineering targets for improving nitrogen fixation efficiency and rate. We show that the enzymes phosphoenolpyruvate carboxylase and pyruvate kinase have significant influence on the predicted rate and efficiency of nitrogen fixation, especially when their expression is varied in combination with oxidative pentose phosphate pathway enzymes like glucose-6-phosphate dehydrogenase and 6-phosphogluconolactonase. The model predicts that pairing a 3-fold decrease in glucose-6-phosphate dehydrogenase activity with either a 3-fold increase in phosphoenolpyruvate carboxylase activity or a decrease in pyruvate kinase activity could increase nitrogen fixation rate by 5.51% while improving nitrogen fixation efficiency by 7.74%.

4
OpenAlea.HydroRoot: A modelling framework to dissect, predict and phenotype branched root hydraulic architecture

Bauget, F.; Ndour, A.; Boursiac, Y.; Maurel, C.; Laplaze, L.; Lucas, M.; Pradal, C.

2026-03-23 plant biology 10.64898/2026.03.19.713025 medRxiv
Top 0.1%
32.9%

Drought is a significant factor in agricultural losses, making it imperative to understand how root system architecture (RSA) adapts to environmental conditions such as water deficit. HydroRoot is a functional-structural plant model (FSPM) aimed at analyzing and simulating hydraulic and solute transport of RSA. The model integrates a static hydraulic solver, a coupled water-solute transport solver, a statistical generator of RSA based on a Markov model, and a dynamic hydraulic model accounting for root growth. This paper presents the model, the mathematical description of the formalism of the solvers, and use cases with their associated tutorials. Five use cases illustrate the capabilities of HydroRoot, which has been successfully used for phenotyping root hydraulics across various species, including Arabidopsis, maize, and millet. The model-driven phenotyping method "cut and flow" is presented to characterize axial and radial conductivities of a given root genotype. Finally, three step-by-step tutorials provide a structured way to learn how to use HydroRoot 1) to simulate hydraulics on a given architecture, 2) to simulate water and solute transport on a maize root, and 3) to simulate hydraulics on two pearl millet genotypes with varying soil conditions. HydroRoot is an open-source package of the OpenAlea platform, with the code publicly available on GitHub. Comprehensive documentation is available with a reproducible gallery of examples.

5
Improved model representation of the photosynthetic light reactions reduces estimates of global gross primary productivity

Lamour, J.; Chave, J.; Johnson, J.; Berry, J.; Davidson, K. J.; Ely, K. S.; Fang, L.; Koven, C. D.; Needham, J. F.; Niinemets, U.; Perez, R. P. A.; Schmiege, S. C.; Zhihong, S.; Way, D. A.; Rogers, A.

2026-05-12 plant biology 10.64898/2026.05.08.723728 medRxiv
Top 0.1%
27.7%

The assimilation of carbon dioxide by plants can be predicted by the Farquhar, von Caemmerer and Berry model of photosynthesis. This largely mechanistic model is central to understanding how plants influence Earth's climate. However, it represents the use of light by photosynthesis using an empirical formulation. Johnson and Berry proposed an alternative mechanistic formulation based on the functioning of the cytochrome b6f complex that includes key steps in light harvesting and electron transport. We compared both formulations using photosynthetic light response measurements from 146 C3 species spanning arctic to tropical biomes and implemented them in the terrestrial biosphere model ELM-FATES to simulate global photosynthesis. The Johnson and Berry formulation better fitted the measured response of leaf-level photosynthesis to light, and predicted lower photosynthetic rates at intermediate light levels, which decreased global estimates of terrestrial photosynthesis by 8%. Our findings support adopting the Johnson and Berry formulation to improve the representation of the global carbon cycle in models.
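For orientation, the empirical light response commonly paired with the Farquhar, von Caemmerer and Berry model is a non-rectangular hyperbola for the electron transport rate J. J is the smaller root of theta*J^2 - (I2 + Jmax)*J + I2*Jmax = 0, where I2 is the light usefully absorbed. The sketch below assumes that standard form; the parameter values (`alpha`, `theta`, `j_max`) are illustrative assumptions, not values from the preprint, and the Johnson and Berry formulation is not shown.

```python
import math


def electron_transport_rate(q: float, j_max: float,
                            alpha: float = 0.3, theta: float = 0.7) -> float:
    """Empirical non-rectangular hyperbola for electron transport rate J.

    q      : incident photosynthetic photon flux density (umol m-2 s-1)
    j_max  : maximum electron transport rate (umol m-2 s-1)
    alpha  : fraction of incident light usefully absorbed (assumed value)
    theta  : curvature factor, 0 < theta <= 1 (assumed value)
    """
    if q < 0 or j_max < 0 or not 0.0 < theta <= 1.0:
        raise ValueError("invalid inputs")
    i2 = alpha * q                      # usefully absorbed light
    b = i2 + j_max
    discriminant = b * b - 4.0 * theta * i2 * j_max  # non-negative for theta <= 1
    return (b - math.sqrt(discriminant)) / (2.0 * theta)


if __name__ == "__main__":
    # Illustrative light response curve for a hypothetical leaf.
    for q in (0, 100, 500, 1000, 2000):
        print(q, round(electron_transport_rate(q, j_max=150.0), 1))
```

The curve rises almost linearly at low light (limited by `alpha * q`) and saturates toward `j_max`; the curvature factor `theta` governs the intermediate light range where the abstract reports the two formulations differ.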

6
Dissecting the Network Architecture of a Plant Circadian Clock Model: Identifying Key Regulatory Mechanisms and Essential Interactions

Singh, S. K.; Srivastava, A.

2026-03-18 systems biology 10.64898/2026.03.15.711848 medRxiv
Top 0.1%
22.8%

Circadian rhythms are self-sustained biological oscillations that coordinate diverse physiological processes in plants, including growth, metabolism, and environmental responses. These rhythms arise from an interconnected transcriptional-translational feedback network that integrates multiple entrainment cues such as light and temperature. The plant circadian clock is organized around key regulatory loops involving CCA1, LHY, PRRs, TOC1, ELF4, LUX, and other transcriptional regulators, whose coordinated interactions ensure precise and robust oscillations. In this study, we developed an ordinary differential equation-based mathematical model, building upon a previous framework to incorporate additional regulatory modules and transcriptional controls that better reflect experimentally observed behaviour. To elucidate the regulatory organization of this model, we performed a multi-layered computational analysis combining four complementary approaches: (i) period sensitivity analysis to quantify how parameter perturbations influence the system's timing, (ii) phase portrait analysis to visualize dynamic interactions among key components, (iii) knockout analysis to identify parameters essential for sustained rhythmicity, and (iv) network impact analysis using composite weighted network indices to evaluate hierarchical control across the network. Together, these analyses reveal that transcriptional repression, protein degradation, and light-regulated synthesis form the dominant control mechanisms within the circadian system. The results highlight a hierarchical and robust network structure centred on the CCA1/LHY and PRRs feedback loop, with redundant modules ensuring stability under perturbations. Thus, this model provides an improved, biologically consistent framework for dissecting the dynamic architecture of the plant circadian clock and guiding future experimental validation.

7
Quantifying the effect of cereal plant trait plasticity on weed suppression in intercrops

Kottelenberg, D. B.; Morales, A.; Anten, N. P. R.; Bastiaans, L.; Evers, J. B.

2026-04-03 plant biology 10.64898/2026.04.01.715874 medRxiv
Top 0.1%
22.0%

In cereal-legume intercrops, weed suppression is primarily driven by cereals, whose competitiveness is shaped by trait plasticity, that is, morphological adjustments in response to the intercrop environment. However, how individual cereal traits respond plastically and contribute to system performance remains unclear, hampering improvements through breeding or system design. We combined field experiments with functional-structural plant modelling to quantify plastic responses of four cereal traits (tiller number, tiller angle, specific leaf area (SLA), and specific internode length (SIL)) and their effects on weed suppression and crop productivity. Field measurements revealed plasticity in tiller number, tiller angle, and SIL between sole crops and intercrops, while SLA showed minimal differences. Simulations showed that intermediate tiller numbers resulted in the strongest weed suppression and highest productivity, indicating an optimum, while more horizontal tillers suppressed weeds slightly better than vertical ones. Weed suppression increased with higher SLA values, while SIL showed a saturating response, increasing to intermediate SIL values and plateauing thereafter. In simulations with short-statured cereal phenotypes (low SIL), the reduction in cereal weed suppression was compensated by the legume component. This study demonstrates how FSP modelling can be used to investigate trait plasticity mechanisms and generate testable hypotheses about trait effects in complex intercrop systems.

Highlight: Cereal trait plasticity shapes weed suppression in cereal-legume intercrops, with distinct response patterns per trait, while legumes can compensate for weakly competitive cereals, suggesting balanced competition over cereal dominance.

8
GE-BiCross: A Hierarchical Bidirectional Cross-Attention Framework for Genotype-by-Environment Prediction in Maize

Zhou, S.; Zhao, T.

2026-03-12 bioinformatics 10.64898/2026.03.10.710816 medRxiv
Top 0.1%
14.7%

Genotype-by-environment interactions are central to crop adaptation and yield stability, yet they remain difficult to model for robust prediction across heterogeneous environments. Although enviromic profiling has improved the characterization of dynamic field conditions, most existing genomic prediction methods adopt a late-fusion strategy that encodes genomic and environmental information independently before global integration, thereby limiting their ability to resolve fine-scale, context-dependent G x E effects. Here, we developed GE-BiCross, a hierarchical bidirectional cross-attention framework for maize prediction. GE-BiCross incorporates a dual-path feature extraction module to disentangle independent and cooperative effects, a tokenized bidirectional cross-attention module to enable reciprocal genotype-environment interaction learning, and a mixture-of-experts module to adaptively capture heterogeneous response patterns across environments. Using a large-scale dataset of approximately 360,000 observations from 4,923 maize hybrids evaluated in 241 environments, GE-BiCross consistently outperformed conventional genomic prediction, machine learning, and deep learning baselines across six agronomic traits. The greatest improvements were observed for environmentally responsive and genetically complex traits. In particular, GE-BiCross achieved an R² of 0.672 for grain yield and 0.880 for grain moisture, significantly surpassing all comparison models. Ablation analyses demonstrated that the three core modules make distinct and complementary contributions to predictive performance. These results show that deep, bidirectional integration of genomic and enviromic information can substantially improve modeling of complex G x E interactions, providing a powerful framework for interpretable genomic prediction and climate-smart crop breeding.

9
Joint modeling of social genetic effects in mono- and pluri-specific groups: case study in intercrops

Salomon, J.; Enjalbert, J.; Flutre, T.

2026-03-31 genetics 10.64898/2026.03.27.714849 medRxiv
Top 0.1%
10.1%

The genetics of interspecific groups remains largely unexplored, despite the central role of social (or indirect) genetic effects in shaping phenotypic expression within communities. Intercropping, i.e. the simultaneous cultivation of multiple crop species in the same field, offers a powerful model to harness these interspecific social effects. Such species mixtures provide well-documented agricultural benefits, yet few breeding frameworks have integrated the genetics of social interactions. Here, we address this gap by extending quantitative genetic theory to interspecific groups, with intercropping as a concrete and applied model case. We propose a quantitative genetic model that jointly analyzes intra- and interspecific interactions within a unifying framework. Breeding values are decomposed into a direct component, shared in mono- and mixed crops, an interspecific social component corresponding to the effect of one species on another, and an intraspecific component that captures the social effects within a mono-genotypic stand of cloned plants. Statistically, this consists of simultaneously fitting several linear mixed models, one per stand type, all having direct breeding values in common. As no open-source software can fit such a complex mixed model, we provide such an implementation in R/C++. Simulations across various genetic (co)variance structures and sparse experimental designs showed accurate estimation of all genetic (co)variances and breeding values. With an incomplete, yet balanced design combining sole crops and intercrops, genetic gains in both systems were achievable simultaneously, enabling breeding strategies that progressively integrate intercropping into existing, sole-crop-only schemes. More broadly, this framework allows dissecting direct and social genetic effects when genotypes are observed in mono- and mixed-species situations, cultivated or not.

10
Efficient Optimization of Genotype Pairs for Intercropping using Genomic Prediction and Bayesian Optimization

Kinoshita, S.; Iwata, H.

2026-05-18 genomics 10.64898/2026.05.15.725387 medRxiv
Top 0.1%
10.0%

Intercropping is a promising strategy to improve productivity and sustainability in agricultural systems, but designing effective genotype combinations remains a major challenge owing to the rapid increase in possible pairings as the number of candidate genotypes increases. This creates a practical bottleneck because field evaluation of all combinations is infeasible under realistic resource constraints. Here, we propose a framework that integrates genomic prediction and Bayesian optimization to support efficient decision-making for intercropping system design. Using genome-wide marker data from sorghum and soybean, we simulated intercropping performance across 5,214 genotype pairs under certain genetic architectures, including variation in heritability, correlations between direct and indirect genetic effects, and the contribution of pair-specific interactions. Genomic prediction models incorporating direct and indirect genetic effects substantially improved prediction accuracy compared with models based on direct genetic effects alone, and inclusion of specific mixing ability further enhanced the performance under high-heritability conditions. When coupled with Bayesian optimization, the models rapidly identified superior genotype pairs, requiring fewer evaluation cycles than random or prediction-only search strategies. Acquisition functions that account for predicted uncertainty were most effective in complex scenarios involving interaction effects or negative correlations between direct and indirect effects. These results demonstrate that combining genomic prediction with Bayesian optimization can substantially reduce the experimental burden associated with intercropping design, while improving the efficiency of identifying high-performing genotype pairs. 
The proposed framework provides a practical approach for prioritizing candidate mixtures in breeding and field evaluation, and contributes to the development of data-driven strategies for sustainable agricultural systems.

Highlights:
- A data-driven framework was developed to optimize genotype pairs in intercropping.
- Modeling indirect effects improved prediction accuracy across genotype pairs.
- Pair-specific interactions enhanced prediction under high-heritability conditions.
- Bayesian optimization identified superior pairs under limited evaluation capacity.
- The framework reduces field-testing requirements for intercropping system design.
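To illustrate what an acquisition function that accounts for predicted uncertainty looks like, here is a minimal sketch of expected improvement, a standard choice in Bayesian optimization. The abstract does not name the acquisition functions that were compared, so this choice is an assumption, and the genotype labels and predicted values are hypothetical.

```python
import math


def normal_pdf(z: float) -> float:
    """Standard normal density."""
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)


def normal_cdf(z: float) -> float:
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def expected_improvement(mean: float, std: float, best_so_far: float) -> float:
    """Expected improvement over the best observed value (maximization)."""
    improvement = mean - best_so_far
    if std <= 0.0:
        # No predictive uncertainty: improvement is deterministic.
        return max(improvement, 0.0)
    z = improvement / std
    return improvement * normal_cdf(z) + std * normal_pdf(z)


def pick_next_pair(predictions: dict, best_so_far: float):
    """Return the candidate pair with the largest expected improvement.

    predictions maps a (sorghum, soybean) pair to (predicted mean, predicted std).
    """
    return max(
        predictions,
        key=lambda pair: expected_improvement(*predictions[pair], best_so_far),
    )


if __name__ == "__main__":
    # Hypothetical genomic predictions for three untested pairs.
    candidates = {
        ("S1", "B1"): (5.0, 0.1),   # high mean, very certain
        ("S2", "B3"): (4.8, 1.0),   # slightly lower mean, uncertain
        ("S3", "B2"): (4.0, 0.2),   # low mean, fairly certain
    }
    print(pick_next_pair(candidates, best_so_far=5.0))
```

In this toy case the uncertain pair is selected even though its predicted mean is lower, because its wide predictive distribution leaves more room for improvement. A prediction-only search would instead pick the pair with the highest mean.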

11
Growth under constraints: root tip development controls trade-offs between speed and mechanical efficiency

Dupuy, L. X.; Yao, J.; de las Heras Martinez, G.

2026-05-14 plant biology 10.64898/2026.05.14.724970 medRxiv
Top 0.1%
10.0%

Growth kinematics and soil mechanics are key to explaining how roots overcome the mechanical resistance of soil, yet few studies link these two factors. Formulas for cone penetration tests are typically used to infer the friction experienced by roots, but these fail to consider how growth affects the external forces applied on the root. This study formalised how expansive growth in the root apical meristem can reduce soil friction, and applied the framework to analyse the growth strategy of six plant species. The results of the analysis revealed trade-offs between reducing friction, maintaining a desired growth trajectory and elongation rate. A shorter elongation zone can reduce the fraction of the mechanical energy lost to friction, but this is done at the expense of the elongation rate. A sharper tip or increased radius can help roots maintain the elongation rate at no energetic cost, but these strategies come with the cost of growth instability (tortuous roots) and decrease in specific root length respectively. During establishment, root strategies may therefore occupy a 2-dimensional trait space in which the mechanical efficiency of growth is balanced against the explorative-exploitative trade-off.

Highlights:
- Growth and form of root tips explain how plants overcome mechanical resistance from the soil.
- Trade-offs link the energy lost by friction, growth stability and elongation rate of roots.
- Larger roots allow faster growth independently of these trade-offs.
- New framework formalises plant strategies to acquire soil resources.

12
Epistatic fitness landscapes emerge from parallel adaptive walks in breeding network metapopulations

Monyak, T.; Morris, G.

2026-03-20 genetics 10.64898/2026.03.18.712732 medRxiv
Top 0.1%
9.1%

Global networks of crop breeding programs leverage diverse germplasm, but diversity increases the complexity of maintaining stability in their elite genepools. To characterize genetic heterogeneity in breeding metapopulations and develop insights on how to manage it, we simulated the evolution of breeding populations on fitness landscapes. We revealed the geometric decrease in the average effect size of alleles segregating as standing variation that become fixed along an adaptive walk. We also demonstrated how independent adaptive walks of subpopulations are influenced by genetic drift, leading to cryptic genetic heterogeneity among elite genepools. This variation is released when elite lines derived from independent subpopulations are crossed, leading to segregation for 2-4X more major QTL in admixed families than in unadmixed families, and 2-4X more epistatic interactions. The emergent property of fitness epistasis for traits under stabilizing selection is well-understood in evolutionary genetics, but under-appreciated in crop quantitative genetics. To highlight the importance of this phenomenon, we constructed an empirical genotype-to-fitness landscape from the sorghum NAM, a global admixed prebreeding resource, demonstrating the utility of fitness landscapes for inferring genetic compatibilities within metapopulations. Our findings suggest that in breeding networks, strategies for effective germplasm exchange must account for epistasis in the oligogenic component of the genetic architecture of locally-adapted traits.

Article summary: Modern public sector crop improvement happens in networks of breeding programs that routinely exchange genetic information. Traditional models for understanding quantitative traits have limited predictiveness in situations with such genetic heterogeneity.
This study uses breeding simulations and empirical data to show the utility of the fitness landscape framework for characterizing the genetic architecture of complex traits in breeding metapopulations. By simulating the evolution of breeding programs and integration into networks, it demonstrates how epistatic interactions between large-effect alleles are a fundamental property that must be accounted for when exchanging germplasm.

13
BioOS: A Gene-Driven Digital Twin Runtime for Emergent Plant Development

Auger, E.; Gandecki, M.; Delarche, C.; Heng, F. X.

2026-03-17 bioinformatics 10.64898/2026.03.14.711542 medRxiv
Top 0.1%
8.3%

Predicting plant mutant phenotypes requires models that connect gene regulation to organ-scale morphogenesis without collapsing mechanism into phenomenological rules. We present BioOS, a curated mechanistic runtime built around the Formal Cell, a minimal signal-processing abstraction in which promoter evaluation, transcription, translation, and protein state drive cell division, differentiation, and elongation. A multi-scale architecture combining TissueUnits and FormalCells enables real-time simulation of bounded Arabidopsis thaliana developmental programs while keeping the primary claim anchored in primary-root auxin transport. On the official five-case root-auxin benchmark, BioOS achieves a 75.4% mean score, 5/5 qualitative matches, 5/5 quantitative passes, and Spearman severity correlation ρ = 0.70. The deployed auxin slice uses a curated 35-gene registry; for readability, this manuscript details an 18-gene core subnetwork. Beyond the primary auxin claim, the same runtime closes official cytokinin (5/5), flowering (5/5), and photosynthesis (7/7) gates, while a candidate root-patterning panel passes 8/8. BioOS should therefore be read as a benchmark-validated runtime for bounded developmental prediction rather than as a single-slice demonstration.

Key Results: The following results summarize the current validation status of BioOS on the primary-root auxin slice and its surrounding benchmark framework:
- 35-gene root-auxin runtime, with an 18-gene core GRN illustrated here
- Emergent division, differentiation, elongation from gene expression
- Official auxin panel closed: 5/5 qualitative matches, 5/5 full passes, 75.4% mean score, ρ = 0.70 severity ranking
- Four official gates closed in one runtime: root_auxin 5/5, cytokinin 5/5, flowering 5/5, photosynthesis 7/7
- Broader benchmark corpus: 6 suites / 63 cases; the 8-case candidate root-patterning panel currently passes 8/8
- Real-time capable: 175 TissueUnits + 200 cells = 8 ms/tick

14
PSoup: an R package for simulating biological networks from a qualitative perspective

Fortuna, N. Z.; Lawson, B. A. J.; Mitsanis, C.; Burrage, K.; Beveridge, C. A.

2026-04-22 plant biology 10.64898/2026.04.19.719106 medRxiv
Top 0.1%
6.8%

Mathematical modelling is essential for understanding how complex biological systems respond to genetic, physiological, and environmental changes. Existing approaches, however, often require trade-offs between mechanistic detail, model size, parameter uncertainty, and interpretability. Ordinary differential equation (ODE) models capture biochemical processes with quantitative precision but can demand extensive parameterisation. In contrast, large statistical and machine-learning models rely on substantial datasets and frequently lack mechanistic transparency. Qualitative approaches such as Boolean networks improve scalability but may oversimplify biological behaviour. To address some of these limitations, we present PSoup, an R package that automatically converts knowledge graphs into transparent, parameter-free, qualitative models. PSoup uses algebraic update rules designed around a fixed, biologically interpretable baseline, enabling predictions of relative change across diverse perturbations without requiring kinetic parameters. This design allows PSoup to integrate information across biological scales and from heterogeneous experimental sources. We evaluated PSoup using the well-studied shoot branching network of Bertheloot et al. (2019), which incorporates hormonal (auxin, strigolactone, cytokinin) and metabolic (sucrose) regulation. Across 78 experimental conditions, PSoup correctly predicted 88.5% of perturbation outcomes, including 89.5% accuracy for unique, biologically consistent comparisons. We further demonstrate how PSoup can distinguish among alternative plausible network topologies, revealing how structural differences influence emergent system behaviour. PSoup offers an intuitive, accessible, and mathematically transparent framework for exploring biological networks.
Its capacity to integrate diverse knowledge and test alternative hypotheses positions it as a powerful tool for biological discovery and a valuable complement to existing modelling approaches.

15
Uncovering genetic mechanisms underlying trait variation in switchgrass using explainable artificial intelligence

Izquierdo, P.; Weng, X.; Juenger, T.; Bonnette, J. E.; Yoshinaga, Y.; Daum, C.; Lipzen, A.; Barry, K.; Blow, M. J.; Lehti-Shiu, M. D.; Lowry, D.; Shiu, S.-H.

2026-03-09 genetics 10.64898/2026.03.06.710154 medRxiv
Top 0.1%
4.7%

Uncovering the genetic architecture of quantitative traits is challenging because polygenic control yields small individual gene effects and because gene-gene and genotype-by-environment interactions add further complexity. To understand the genetic basis of polygenic traits and their plasticity across environments, we integrated genome-wide SNPs and RNA-seq transcript data with interpretable statistical and machine learning models in a switchgrass (Panicum virgatum) diversity panel grown at contrasting field sites in Michigan and Texas. Notably, in addition to single environments, our trait prediction models were able to predict phenotypic differences across environments, i.e., plasticity. By interpreting trait prediction models with explainable artificial intelligence methods, we identified important features, namely genes that are the most predictive of flowering time and annual biomass production across environments, based on their associated gene expression levels and nearby SNPs. This approach recovered canonical flowering regulators and revealed novel, environment-specific candidate flowering genes. Further, transcriptome models consistently recovered more switchgrass genes homologous to experimentally validated genes in Arabidopsis and rice than SNP-based models. Feature interaction scores from the models also allow the identification of trait- and environment-dependent gene-gene interactions, where flowering time showed stronger and more abundant interactions than biomass. While some of the interactions identified are consistent with the link between flowering time and yield, most are novel predictors that need to be further evaluated. Together, these results demonstrate that interpretable genomic prediction with explainable artificial intelligence approaches can convert trait prediction models into mechanistic hypotheses about putative causal genes and interactions controlling traits within and across environments.
These results will help to prioritize target genes for validation and inform germplasm selection for cultivar improvement.

16
Crop yields under simulated nuclear winter: a growth chamber experiment

Blouin, S.; Abrams, D. R.; Ben-Zeev, R.; Anderson, C. T.; Lasky, J. R.; Denkenberger, D.

2026-05-07 plant biology 10.64898/2026.05.05.723012 medRxiv
Top 0.1%
3.7%

A global nuclear war could inject soot into the stratosphere, blocking sunlight and causing rapid cooling. Assessments of the resulting agricultural collapse rely on crop models never validated under such conditions. We grew wheat, canola, and potato in growth chambers simulating the light and temperature of an extreme nuclear winter at tropical and temperate sites. In the tropical chamber (18-20 °C, 200 µmol m-2 s-1 PAR), all three crops produced viable yields. Wheat yielded 2.1-2.3 t/ha (n=3 well-watered, n=3 water-stressed pots), 60% of the global average, and single-pot observations of canola and potato suggested biological yields comparable to global averages. In the temperate chamber simulating nuclear winter irradiance (60-360 µmol m-2 s-1), wheat stems collapsed under their own weight. Although hand-harvesting recovered 0.6-2.8 t/ha of grain, mechanical field harvest of a flat canopy would recover substantially less. This failure mode was not observed in a higher-light control chamber and is not captured by existing crop models, which may therefore overestimate temperate cereal production under nuclear winter. Canola produced comparable yields under both temperate light regimes without lodging. Empirical screening of additional staples is needed to identify which remain viable under nuclear winter.

17
Robust Random Forests for Genomic Prediction: Challenges and Remedies

Lourenco, V. M.; Ogutu, J. O.; Piepho, H.-P.

2026-04-01 bioinformatics 10.64898/2026.03.30.715203 medRxiv
Top 0.1%
3.5%

Data contamination, from recording errors to extreme outliers, can compromise statistical models by biasing predictions, inflating prediction errors, and, in severe cases, destabilizing performance in high-dimensional settings. Although contamination can affect responses and covariates, we focus on response contamination and evaluate Random Forests through simulation. Using a synthetic animal-breeding dataset, we assess robust Random Forests across several contamination scenarios and validate them on plant and animal datasets. We thereby clarify the consequences of contamination for prediction, develop a robust Random Forest framework, and evaluate its performance. We examine preprocessing or data-transformation strategies, algorithmic modifications, and hybrid approaches for robustifying Random Forests. Across these approaches, data transformation emerges as the most effective strategy, delivering the strongest performance under contamination. This strategy is simple, general, and transferable to other Machine Learning methods, offering a remedy for robust genomic prediction. In real breeding data, robust Random Forests are useful when substantial contamination, phenotypic corruption, misrecording, or train-deployment mismatch is plausible and the goal is to recover a latent signal for genomic prediction and selection; ranking-based robust Random Forests are the dependable first option, whereas weighting-based Random Forests should be used only when their weighting scheme preserves rank structure and improves prediction. Robustification is not universally necessary, but it becomes important when contamination distorts the link between observed responses and the predictive target; standard Random Forests remain the default for clean data, whereas robust Random Forests should be fitted alongside them whenever contamination is plausible, with the final choice guided by data, trait, and breeding objective. 
Author summary: Machine learning (ML) methods are widely used for prediction with high-dimensional, complex data, and supervised approaches such as Random Forests (RF) have proved effective for genomic prediction (GP) and selection. Yet their performance can be severely compromised by data contamination if the algorithms rely on classical data-driven procedures that are sensitive to atypical observations. Robustifying ML methods is therefore important both for improving predictive performance under contamination and for guiding their practical use in high-dimensional prediction problems. To address this need, we develop robust preprocessing, algorithm-level, and hybrid strategies for improving RF performance with contaminated data. Using simulated animal data, we show that ranking- and weighting-based robust RF provide the strongest overall compromise for genomic prediction and selection under contamination. Validation on several plant and animal breeding datasets further shows that the benefits of robustification are not universal, but depend on the dataset, trait, and breeding objective. Although motivated by RF, the framework we propose is general, practical, and readily transferable to other ML methods. It also offers a basis for deciding when robustness should complement standard RF rather than replace it outright.
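
As a rough illustration of why a rank-based transformation of the response can blunt contamination, the sketch below replaces each phenotype value with the normal quantile of its rank. This is only an assumed, generic rank-to-normal-scores transform; the abstract does not specify the authors' exact transformation, and the function name `rank_to_normal_scores` and the toy phenotype values are hypothetical. The transformed response would then be passed to any Random Forest implementation in place of the raw values.

```python
from statistics import NormalDist


def rank_to_normal_scores(values):
    """Replace each value with the standard-normal quantile of its average rank.

    Illustrative only: a generic rank-based transform, not the paper's method.
    """
    n = len(values)
    order = sorted(range(n), key=lambda i: values[i])
    ranks = [0.0] * n
    i = 0
    while i < n:
        # Find the run of tied values starting at sorted position i.
        j = i
        while j + 1 < n and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1  # 1-based average rank shared by ties
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1
    nd = NormalDist()
    # (rank - 0.5) / n keeps every probability strictly inside (0, 1).
    return [nd.inv_cdf((r - 0.5) / n) for r in ranks]


# Hypothetical phenotypes: the fourth record is misrecorded in the second list.
clean = [2.1, 2.4, 2.2, 2.6, 2.3]
contaminated = [2.1, 2.4, 2.2, 260.0, 2.3]

scores_clean = rank_to_normal_scores(clean)
scores_contaminated = rank_to_normal_scores(contaminated)
```

Because the misrecorded value is still the largest observation, its rank is unchanged and both lists map to the same scores, so the outlier's magnitude no longer dominates the training target. The cost is that predictions are on the transformed scale, which suits ranking candidates for selection better than predicting absolute trait values.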

18
Dissecting oligogenic and polygenic indirect genetic effects through the lens of neighbor genotypic identity

Sato, Y.; Hamazaki, K.

2026-04-03 genetics 10.64898/2026.03.31.715746 medRxiv
Top 0.1%
3.5%

Individual phenotypes often depend on the genotypes of other individuals within a group. These phenomena are termed indirect genetic effects (IGEs) and have been distinguished from direct genetic effects (DGEs) using quantitative genetic models. Recent studies have utilized high-resolution polymorphism data to enable genomic prediction (GP) and genome-wide association studies (GWAS) of IGEs, but unified methods remain limited. Here we integrate polygenic and oligogenic IGEs using a multi-kernel mixed model incorporating two random effects with a single covariance parameter. Underlying this implementation, the Ising model of ferromagnetics enabled us to simplify locus-wise and background IGEs for GWAS and GP, respectively. Our simulations demonstrated that, while the previous and present models exhibited similar performance, the present model can infer a trade-off between DGEs and IGEs. By applying this method to three species of woody plants, we found evidence for intergenotypic competition in aspen and apple trees, but limited evidence in climbing grapevines. Based on GWAS, we also detected significant variants associated with competitive IGEs on apple trunk growth. Our study offers a flexible implementation for GWAS/GP of IGEs, thereby providing an effective tool to dissect the genetic architecture of group performance.

19
Reaction Norm Modeling of High-Dimensional Genomic and Environmental Data Improves Prediction Accuracy in Winter Wheat

Acharya, S. R.; Garcia-Abadillo, J.; Lyerly, J.; Brown-Guedira, G.; Jarquin, D.; Bandillo, N.

2026-05-08 genetics 10.64898/2026.05.05.722758 medRxiv
Top 0.1%
3.3%

Genomic prediction models that account for genotype-by-environment (GxE) interactions have the potential to accelerate the rate of genetic gain for yield and agronomic performance, yet relatively few studies have applied GxE prediction in public soft red winter wheat (Triticum aestivum) breeding programs. In this study, we extended a reaction norm-based genomic prediction framework by integrating weather-based environmental covariates to more effectively capture genotype-environment interactions. Key agronomic traits, including seed yield, plant height, test weight, and heading date, were evaluated across 33 environments (location-year) using over 3,200 breeding lines from the North Carolina State University small grains breeding program. Multiple genomic prediction models were compared using several cross-validation (CV) schemes representing common breeding scenarios. Across traits, the reaction norm M5 model, which incorporates both GxE and genotype-by-environmental covariate interactions (GxO), achieved the highest prediction accuracy (PA) in CV2 (predicting incomplete field trials) and CV1 for yield and test weight (predicting new lines). The highest PA was observed for test weight under CV2 (0.54) and for yield under CV1 (0.41). Under CV0 (predicting new environments), the M3 model incorporating GxE produced the highest PA across traits, with the greatest accuracy for plant height (0.45), although differences among M2, M3, and M4 were small. Prediction under CV00 (predicting new lines in new environments) remained more challenging, with PA values of 0.10-0.20 across traits. Overall, our results demonstrate that integrating environmental covariates into genomic prediction models can improve predictive performance across diverse wheat-growing environments in North Carolina, supporting their utility for applied breeding efforts.

Core ideas:
- Integrating genotype-by-environment (GxE) interactions with environmental covariates improves prediction accuracy across environments.
- Model performance varies by prediction scenario, with different approaches performing best for new lines, incomplete trials, or new environments.
- Prediction of new lines in new environments remains challenging.

Plain language summary: This study explores how adding environmental information to genomic prediction models can improve prediction accuracy in a public winter wheat breeding program. Using data from multi-environment trials conducted across diverse conditions in North Carolina, we evaluated statistical models that capture how different wheat lines respond to changing environments. By incorporating weather data, we improved the ability to predict performance across locations and years. These findings provide practical insights for refining selection strategies and accelerating genetic gain in wheat breeding.

20
A Bayesian approach for identifying similar transcript dynamics using curve registration

Kristianingsih, R.; Calderwood, A.; Sidhu, G.; Woodhouse, S.; Woolfenden, H. C.; Kurup, S.; Wells, R.; Morris, R. J.

2026-04-29 bioinformatics 10.64898/2026.04.26.720911 medRxiv
Top 0.1%
3.0%

Changes in gene expression over time can provide valuable insights into developmental processes and responses to the environment. Differences in expression may indicate differences in regulation. Comparing transcript dynamics may help identify correspondences between developmental stages within and between species, differences in the timing of key events during development, and transcriptional responses to treatments or perturbations. A straightforward comparison between the dynamics is, however, hindered by measurements that were taken at different time points and over different timescales. To address this, we developed a statistical approach that seeks the optimal alignment between two time series as a function of a temporal shift and stretch. We validated our approach using simulated data and applied it to several transcriptome datasets, including comparisons between different plant species. Our development facilitates knowledge transfer from model systems to less studied species, the identification of modules of co-regulated genes, and the discovery of condition-specific, temporally differentially-expressed genes. The method is freely available as an R package.
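
The idea of aligning two time series through a temporal shift and stretch can be sketched as follows. This is a deliberately simplified stand-in, assuming a least-squares grid search with linear interpolation; it is not the Bayesian method or the R package described in the abstract, and the function names (`interpolate`, `register`), the parameter grids, and the toy expression values are all hypothetical.

```python
def interpolate(times, values, t):
    """Linearly interpolate values at time t; return None outside the sampled range."""
    if t < times[0] or t > times[-1]:
        return None
    for k in range(len(times) - 1):
        t0, t1 = times[k], times[k + 1]
        if t0 <= t <= t1:
            w = (t - t0) / (t1 - t0)
            return values[k] + w * (values[k + 1] - values[k])
    return None


def register(ref_t, ref_y, qry_t, qry_y, shifts, stretches, min_overlap=3):
    """Grid-search the (shift, stretch) mapping query time t to stretch * t + shift
    on the reference axis, minimising mean squared error over overlapping points.

    Returns (mse, shift, stretch), or None if no candidate has enough overlap.
    """
    best = None
    for stretch in stretches:
        for shift in shifts:
            sq_err, n = 0.0, 0
            for t, y in zip(qry_t, qry_y):
                ref_val = interpolate(ref_t, ref_y, stretch * t + shift)
                if ref_val is not None:
                    sq_err += (y - ref_val) ** 2
                    n += 1
            if n >= min_overlap:
                mse = sq_err / n
                if best is None or mse < best[0]:
                    best = (mse, shift, stretch)
    return best


# Hypothetical reference profile sampled at times 0..10.
ref_t = list(range(11))
ref_y = [(t - 5) ** 2 for t in ref_t]

# Hypothetical query profile: the same dynamics, but its time t corresponds
# to reference time 2 * t + 1 (it runs twice as fast and starts later).
qry_t = [0, 1, 2, 3, 4]
qry_y = [16, 4, 0, 4, 16]

best = register(ref_t, ref_y, qry_t, qry_y, shifts=[0, 1, 2], stretches=[1.0, 1.5, 2.0])
```

In this toy case `best` recovers the shift of 1 and stretch of 2.0 used to build the query. One caveat of the sketch: candidates are compared by mean squared error even when they overlap the reference at different numbers of points, which a full statistical treatment would need to account for, for example by comparing the registered model against a model in which the two series are unrelated.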